install.packages("tidyverse") # packages for data wrangling and plotting including dplyr, readr, tidyr, stringr, readxl, ggplot2, and othersThe basics of ggplot2
notes to self
have 2-3 plot with these features
- plot with text labels that need adjusting
- unedited axes labels
- points with different colors
- background color
- hline for editing
show them how to save as pdf
send above list to jenna
Workshop materials
Workshop materials can be found at https://github.com/vancleve/data2design_uk.
Download R and RStudio
If you haven’t already downloaded and installed R and RStudio, please do so now.
R
Rstudio
Getting a Project going in RStudio
RStudio cheatsheet
Create a new project in RStudio
Using a “project” allows us to keep all our files in one place and helps RStudio understand which files belong to which project
- Go to
File->New Project->New directory->New project - Select an easy to find directory
- Name your project something descriptive like “data2design_ggplot_tutorial”.
- Save this
qmdfile into your project directory.
Go to https://github.com/vancleve/data2design_uk/raw/refs/heads/main/ggplot_tutorial.qmd andSave As...in your browser.
Packages
Packages are groups of different functions or “actions” that allows us to do special tasks. People write packages to help others with reproducibility of analyses.
We will be working with the plotting package ggplot2 and a set of packages for data wrangling called tidyverse (https://www.tidyverse.org/). By installing just tidyverse, we can get all those packages at once.
Once installed, packages must be loaded into our “environment” (it is like bringing a pen if you are planning to write, not only buying it!). To load them we use the function library()
library(tidyverse)── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.4 ✔ tidyr 1.3.1
✔ purrr 1.0.4
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readxl)Reading data
The function read_excel from the package readxl allows us to read excel files and read_csv allows us to read csv (comma separated value) files. We will first load a dataset from Onstein et al. (2019)1 frugivory‐related traits and dispersal in the custard apple family. In the code below, we’ll load the data directly from Dryad (https://datadryad.org/dataset/doi:10.5061/dryad.2hd8b0s), which is a repository for archiving datasets, by first downloading the xlsx file and then loading it with read_excel. We’ll also simplify some of the column names too for clarity and turn some of the columns from log values into normal values.
tf = tempfile(fileext = ".xlsx")
curl::curl_download("https://datadryad.org/downloads/file_stream/82494", tf)
fruits <-
read_excel(tf, sheet = "Matrix for analysis", na = "NA") |>
rename(Taxon_tree = Species_tree, Taxon = Species_PROTEUS) |>
mutate(across(starts_with("Log_"), exp, .names = "Exp_{col}")) |>
rename_with(\(x) gsub("Exp_Log_", "", x, fixed = TRUE))
## Check what is inside the dataset, was it read fine?
fruits# A tibble: 228 × 27
Taxon_tree Taxon Log_Fruit_length_avg Log_Fruit_width_avg
<chr> <chr> <dbl> <dbl>
1 Alphonsea_boniana_JB57 Alphon… NA NA
2 Alphonsea_elliptica_JB34 Alphon… 0.763 0.580
3 Alphonsea_kinabaluensis_JB85 Alphon… 0.415 0.279
4 Ambavia_gerrardii_PB Ambavi… 0.301 0.114
5 Anaxagorea_javanica_JB Anaxag… 0.544 0.544
6 Anaxagorea_luzonensis_JB Anaxag… 0.415 -0.222
7 Anaxagorea_phaeocarpa_0498 Anaxag… 0.505 0.301
8 Anaxagorea_silvatica_0113 Anaxag… 0.176 0.0414
9 Annickia_chlorantha_0976 Annick… 0.114 -0.155
10 Annickia_kummeriae_MWC7004 Annick… 0.342 0.0414
# ℹ 218 more rows
# ℹ 23 more variables: Log_Fruit_number_avg <dbl>, Log_Stipe_length_avg <dbl>,
# Log_Seed_length_avg <dbl>, Log_Seed_width_avg <dbl>,
# Log_Seed_number_avg <dbl>, Log_Height_max <dbl>, Bright <chr>,
# Pubescent <chr>, Defence <chr>, Fruit_type <chr>, Cauliflory <chr>,
# Moniliform <chr>, Dehiscence <chr>, Shrub <chr>, Conspicuousness <chr>,
# Fruit_length_avg <dbl>, Fruit_width_avg <dbl>, Fruit_number_avg <dbl>, …
Plotting and understanding our data
Basics of ggplot2: bar plots
The ggplot2 package is built on the idea that graphics have can have a “grammar”, or set or rules, that specifies how they can and should be constructed. Implementing these rules not only makes creating graphics easier, but it makes such graphics consistent and clear. Hadley Wickham, the creator of ggplot2, borrows this idea from the book, “The Grammar of Graphics”” by Wilkinson, Anand, and Grossman (2005)2. While this structure may seem a bit artificial at first, it makes creating graphics very modular and building up complex graphics much easier.
Our first plot will be one of the simplest, which is a bar plot. Let’s plot the number of species by their Fruit_type.
ggplot(fruits) +
geom_bar(aes(x = Fruit_type))The above ggplot command has two pieces. The first is a call to ggplot with the name of the data table. This command by itself creates a blank canvas onto which we can plot using data from the data table. The second piece is to “add” a layer to this canvas with +, and that layer is a bar plot. The geom part is short for geometry since all the graphical elements of different plot types are made up of geometric pieces. In the argument to geom_bar, we give an “aesthetic mapping” with aes(x = Fruit_type), which says we want the x-axis to map to the Fruit_type column of our data. Finally, geom_bar builds bars automatically whose height is given by the number of rows in the data table with each x value.
This plot is a bit boring but we can spice it up with another aesthetic. Let’s try the color of the bar, which is known as the “fill”. We’ll set the fill to the Cauliflory variable.
ggplot(fruits) +
geom_bar(aes(x = Fruit_type, fill = Cauliflory))ggplot is smart here and automatically adds a legend for the color aethetic since you otherwise wouldn’t know which color mapped to which value of Dehiscence. In fact, ggplot also automatically added the tick labels for the same reason for the x-axis aesthetic.
Grids of plots
Looking at our dataset, there are a few other fruit variables we can look at like Dehiscence and Moniliform. What if we wanted to see if how the number of each Fruit_type varies as a function of different values of Dehiscence and Moniliform? One way to visualize this would be to replicate the bar plot above for each combination of Dehiscence and Moniliform. It turns out that ggplot let’s us do this very easily by adding a facet_wrap layer onto our bar plot.
ggplot(fruits) +
geom_bar(aes(x = Fruit_type, fill = Cauliflory)) +
facet_wrap(vars(Dehiscence, Moniliform))What can we observe from this?
Notice that the y-axis scales are all the same. This is on purpose and allows us to compare the data across the panels of the plot. However, we miss the variation in the panels with fewer observations. To allow the y-axis scales to adjust in each panel, we can add scales = "free_y" to facet_wrap.
ggplot(fruits) +
geom_bar(aes(x = Fruit_type, fill = Cauliflory)) +
facet_wrap(vars(Dehiscence, Moniliform), scales = "free_y")Scatter plots
When do we use scatter plots? Scatter plots are great for looking at how variables are correlated to one another and for seeing the full distribution of the data since you typically plot every single point.
Let’s make our first basic scatter plot. Notice that we put the aes command as an argument to the ggplot command. This works just as well as putting it in the geom_point command except that we can add on additional geometries without having to repeat the aes command.
ggplot(fruits, aes(x=Fruit_length_avg, y=Fruit_width_avg)) +
geom_point()Warning: Removed 24 rows containing missing values or values outside the scale range
(`geom_point()`).
What do we notice?
Now let’s add a regression (linear model) line through the points.
ggplot(fruits, aes(x = Fruit_length_avg, y = Fruit_width_avg)) +
geom_point() +
geom_smooth(method="lm")`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 24 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 24 rows containing missing values or values outside the scale range
(`geom_point()`).
The geom_smooth function is what adds the linear regression line. Note here we have to add method="lm" to tell geom_smooth to add a straight line (i.e., linear); otherwise, it defaults to a more complex method that fits a smooth curve.
ggplot(fruits, aes(x = Fruit_length_avg, y = Fruit_width_avg)) +
geom_point() +
geom_smooth()`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Warning: Removed 24 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 24 rows containing missing values or values outside the scale range
(`geom_point()`).
By adding a color aesthetic, the points are colored based on the value of the variable we map to color. Here, let’s map Shrub to color and keep the linear regression.
ggplot(fruits, aes(x = Fruit_length_avg, y = Fruit_width_avg, color = Shrub)) +
geom_point() +
geom_smooth(method = lm, se = FALSE, fullrange = TRUE)`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 24 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 24 rows containing missing values or values outside the scale range
(`geom_point()`).
Are groups different?
Volcano plots
Scatter plots can be used to visualize all kinds of data and are especially population in genomics data. For example, differential expression analyses using RNA-seq are very common and a “volcano plot” is often used to show which genes have increased and which have decreased expression in an experiment. We’ll use a dataset common in RNA-seq differential expression tutorials3, airway, which comes from an RNA-Seq experiment on four human airway smooth muscle cell lines treated with dexamethasone4. The data from airway can be loaded from the workshop GitHub site.
airway_de = read_csv("https://raw.githubusercontent.com/vancleve/data2design_uk/refs/heads/main/airway_de_transcript.csv")Rows: 15926 Columns: 19
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (7): .feature, albut, gene_id, gene_name, gene_biotype, seq_name, symbol
dbl (8): gene_seq_start, gene_seq_end, seq_strand, logFC, logCPM, F, PValue,...
lgl (4): entrezid, seq_coord_system, GRangesList, .abundant
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
airway_de# A tibble: 15,926 × 19
.feature albut gene_id gene_name entrezid gene_biotype gene_seq_start
<chr> <chr> <chr> <chr> <lgl> <chr> <dbl>
1 ENSG00000000003 untrt ENSG000… TSPAN6 NA protein_cod… 99883667
2 ENSG00000000419 untrt ENSG000… DPM1 NA protein_cod… 49551404
3 ENSG00000000457 untrt ENSG000… SCYL3 NA protein_cod… 169818772
4 ENSG00000000460 untrt ENSG000… C1orf112 NA protein_cod… 169631245
5 ENSG00000000971 untrt ENSG000… CFH NA protein_cod… 196621008
6 ENSG00000001036 untrt ENSG000… FUCA2 NA protein_cod… 143815948
7 ENSG00000001084 untrt ENSG000… GCLC NA protein_cod… 53362139
8 ENSG00000001167 untrt ENSG000… NFYA NA protein_cod… 41040684
9 ENSG00000001460 untrt ENSG000… STPG1 NA protein_cod… 24683489
10 ENSG00000001461 untrt ENSG000… NIPAL3 NA protein_cod… 24742284
# ℹ 15,916 more rows
# ℹ 12 more variables: gene_seq_end <dbl>, seq_name <chr>, seq_strand <dbl>,
# seq_coord_system <lgl>, symbol <chr>, GRangesList <lgl>, .abundant <lgl>,
# logFC <dbl>, logCPM <dbl>, F <dbl>, PValue <dbl>, FDR <dbl>
Each row of the table is gene whose expression was measured in the experiment, logFC is the log2 of the ratio of the expression in the treatment vs the control, PValue is the significance of the comparison, and FDR is false discovery rate.
Plotting the volcano plot is as simple as plotting the logFC on the x-axis and the -log10(Pvalue) on the y-axis. We can color the points by those with a FDR < 0.05.
ggplot(airway_de, aes(x = logFC, y = -log10(PValue), colour = FDR < 0.05)) +
geom_point()What if we want to color points as well by those with a greater than 2 fold change in expression up or down? Then we can do a little wrangling to create a new signficance column and use that for the color.
airway_de_sig = airway_de |>
mutate(significance =
case_when(
FDR < 0.05 & abs(logFC) >= 2 ~ "significant FDR and FC",
FDR < 0.05 & abs(logFC) < 2 ~ "significant FDR",
FDR > 0.05 & abs(logFC) >= 2 ~ "significant FC",
.default = "non-significant"
))
ggplot(airway_de_sig, aes(x = logFC, y = -log10(PValue), colour = significance)) +
geom_point()Finally, what about labeling some of the genes with the most significant (i.e., lowest) p-values? To do this, we need a little more data wrangling and a new geometry called geom_text that places a label on top of a data point.
airway_de_sig_lbl =
airway_de_sig |>
mutate(label = ifelse(PValue < 5e-9, gene_name, ""))
plt = ggplot(airway_de_sig_lbl, aes(x = logFC, y = -log10(PValue))) +
geom_point(aes(colour = significance)) +
geom_text(aes(label = label))
pltThis looks great except the labels are right on top of the points and they overlap. We can fix this in Illustrator! Let’s save the plot to a PDF so we can load it into illustrator later.
ggsave("airway_de_plot.pdf", plt, width = 8, height = 5)Plotting distributions
Often we are interested in understanding how often observations of a certain size appear in our data. For example, how many species have fruit of a certain length? To answer questions like this, we need to visualize the distribution of the data.
Questions about distributions often come up when our variables are contrinous or metric, which means that we can add and subtract values of the variable and create summaries of the variable like its mean value.
Histograms
One of the most common ways of visualizing a distribution is with a histogram, which groups observations into bins (or cajitas) to show us which kinds of observations are more frequent.
Let’s make a basic histogram!
ggplot(fruits, aes(x=Fruit_length_avg)) + geom_histogram()`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
Let’s interpret it!
Making histograms pretty and more useful
Changing colors in and out and saving it
p_histogram<- ggplot(fruits, aes(x=Fruit_length_avg))+ geom_histogram(color="darkblue", fill="hotpink")
p_histogram`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
What if we want to separate the data by the type of plant? and we have a variable Shrub in the dataset that tells us which one is a shrub or not a shrub
p_histogram<-ggplot(fruits, aes(x=Fruit_length_avg, color=Shrub)) +
geom_histogram(fill="white")
p_histogram`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
What if we want to add the mean of each group?
## Calculating the mean per group using Dplyr
## we are naming the mean of each group mean_fl (as a short for mean fruit length)
mu<-fruits %>% group_by(Shrub) %>%
summarize(mean_fl=mean(Fruit_length_avg, na.rm=TRUE),
.groups = 'drop')
mu# A tibble: 3 × 2
Shrub mean_fl
<chr> <dbl>
1 not_shrub 1.62
2 shrub 1.56
3 <NA> NaN
The little symbol %>% are called pipes in this package. Piping is an important tool for reproducibility. You never change your original data, you only reorder internally and put through a pipe line
Ok, so what is happening with the dimension of mu? what do you notice?
dim(mu)[1] 3 2
### (there are NAs this will be annoying later on, so we are going to remove)
mu<-mu[2:3,] ## discuss here how R reads tables and matrices as rows and columns!!!!
# another option mu<- mu[-1,]Okay so now finally add the mean to the groups and the histogram
p_histogram<-ggplot(fruits, aes(x=Fruit_length_avg, color=Shrub)) +
geom_histogram(fill="white", position="dodge")+
geom_vline(data=mu, aes(xintercept=mean_fl, color=Shrub),
linetype="dashed")+
theme(legend.position="top")
p_histogram`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
What if I want my own colors? Let’s check some cool options (aka coolors)
p_histogram<- p_histogram+scale_color_manual(values=c("hotpink", "#56B4E9")) #EXPLAIN
p_histogram`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
But I want to make it publication quality! Let’s make nice labels on the axes and clean the background
p_histogram<- p_histogram+xlab("Average Fruit Length")
p_histogram`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
p_histogram<-p_histogram+theme_classic()
p_histogram`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range (`stat_bin()`).
Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
Density plots
What are density plots?- They are areas under the curve and later they will help us to decide about probability!
Let’s make a simple one!
p_density<-ggplot(fruits, aes(x=Fruit_length_avg)) + geom_density()
p_densityWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Make it pretty colors
p_density<-ggplot(fruits, aes(x=Fruit_length_avg))+
geom_density(color="darkblue", fill="lightblue")
p_densityWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Remember those pesky NAs what happens in density plots if we remove them. For that we will use the function subset().
p_density<-ggplot(data=subset(fruits,!is.na(Fruit_length_avg)), aes(x=Fruit_length_avg)) + geom_density()
p_densityWe will add the mean and remember to remove NAs (do one with and one without NAs)
mean.fruit=mean(fruits$Fruit_length_avg,na.rm=TRUE)
mean.fruit[1] 1.596013
p_density<-p_density+ geom_vline(aes(xintercept=mean.fruit),
color="blue", linetype="dashed", size=1)Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
p_densityChange density plot line colors by groups
p_density<-ggplot(fruits, aes(x=Fruit_length_avg, color=Shrub)) +
geom_density() + geom_vline(data=mu, aes(xintercept=mean_fl, color=Shrub),
linetype="dashed")
p_densityWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
# Fill them in
p_density<-ggplot(fruits, aes(x=Fruit_length_avg, fill=Shrub)) + geom_density(alpha=0.4)+ geom_vline(data=mu, aes(xintercept=mean_fl, color=Shrub), linetype="dashed")
p_densityWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
Adding my color scheme (tolk about coolors here)
p_density<-p_density+scale_fill_manual(values=c("hotpink", "#56B4E9"))
p_densityWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
Making it publication style
p_density<- p_density+ xlab("Average Fruit Length")+theme_classic()
p_densityWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_vline()`).
Histogram with density plot
p_histdensity<-ggplot(fruits, aes(x=Fruit_length_avg)) +
geom_histogram(aes(y=..density..), colour="black", fill="white")+
geom_density(alpha=.2, fill="hotpink")
p_histdensityWarning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_bin()`).
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_density()`).
Making Box and Whisker plots
What are box and whisker plots? What are quartiles and interquartile difference??
ggplot(data=fruits, aes(x=Fruit_length_avg)) +
geom_boxplot()Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_boxplot()`).
How do we make it vertical???
p_boxplot<-ggplot(data=fruits, aes(y=Fruit_length_avg)) +
geom_boxplot()
p_boxplotWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_boxplot()`).
How do we make it box and whisker plots by group??
p_boxplot<-ggplot(fruits, aes(y=Fruit_length_avg, color=Shrub)) +
geom_boxplot()
p_boxplotWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_boxplot()`).
How do we change the color?
p_boxplot<-ggplot(fruits, aes(x=Shrub,y=Fruit_length_avg, color=Shrub)) +
geom_boxplot()+scale_color_manual(values=c("hotpink", "#56B4E9"))
p_boxplotWarning: Removed 15 rows containing non-finite outside the scale range
(`stat_boxplot()`).
How do we fill them in
p2<-ggplot(fruits, aes(x=Shrub,y=Fruit_length_avg, fill=Shrub)) +
geom_boxplot()
p2+scale_fill_manual(values=c("hotpink", "#56B4E9"))Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_boxplot()`).
Eliminating those pesky NAs from plots
p_boxplot<-ggplot(data=subset(fruits, !is.na(Fruit_length_avg)), aes(x=Shrub,y=Fruit_length_avg, color=Shrub)) +
geom_boxplot()+scale_color_manual(values=c("hotpink", "#56B4E9"))
p_boxplotHow do we add the mean?
p_boxplot<-p_boxplot + geom_point(data=mu,aes(x=Shrub,y=mean_fl),shape=23, size=4)
p_boxplotWarning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
Adding good labels
p_boxplot<-p_boxplot+xlab("Shrub status")+ylab("Average fruit length")
p_boxplotWarning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
How do we add all the sample?
p_boxplot<-p_boxplot + geom_jitter(shape=16, position=position_jitter(0.2))
p_boxplotWarning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
All pretty and ready for publication
p_boxplot<-p_boxplot+theme_classic()
p_boxplotWarning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
Let’s load the libraries and the dataset
library(airway)
library(tidybulk)
# load airway RNA sequencing data
data(airway)
# convert to tidybulk tibble
tbl_airway <-
airway |>
tidybulk()tbl_filtered = tbl_airway |>
mutate(.sample=str_remove(.sample, "SRR1039")) |>
mutate(symbol = AnnotationDbi::mapIds(org.Hs.eg.db::org.Hs.eg.db, keys = as.character(.feature), keytype = "ENSEMBL", column="SYMBOL", multiVals = "first")) |>
keep_abundant(factor_of_interest=dex)
'select()' returned 1:many mapping between keys and columns
tbl_de = tbl_filtered |>
test_differential_abundance(
.formula = ~ 0 + dex + cell,
contrasts = c("dextrt - dexuntrt"),
omit_contrast_in_colnames = TRUE
)=====================================
tidybulk says: All testing methods use raw counts, irrespective of if scale_abundance
or adjust_abundance have been calculated. Therefore, it is essential to add covariates
such as batch effects (if applicable) in the formula.
=====================================
This message is displayed once per session.
tidybulk says: to access the raw results (fitted GLM) do `attr(..., "internals")$edgeR`
This message is displayed once per session.
tbl_transcript = tbl_de |> pivot_transcript()
tbl_transcript |>
ggplot(aes(x = logFC, y = PValue, colour = FDR < 0.05)) +
geom_point() +
scale_y_continuous(trans = "log10_reverse")tbl_transcript |> write_csv("airway_de_transcript.csv")Footnotes
Onstein, R. E., W. D. Kissling, L. W. Chatrou, T. L. P. Couvreur, H. Morlon, and H. Sauquet. 2019. Which frugivory-related traits facilitated historical long-distance dispersal in the custard apple family (Annonaceae)? Journal of Biogeography 46:1874–1888.↩︎
Wilkinson, L. 2005. The grammar of graphics. Statistics and computing. Springer New York.↩︎
https://stemangiola.github.io/rpharma2020_tidytranscriptomics/articles/tidytranscriptomics.html↩︎
Himes, B. E., X. Jiang, P. Wagner, R. Hu, Q. Wang, B. Klanderman, R. M. Whitaker, et al. 2014. RNA-Seq Transcriptome Profiling Identifies CRISPLD2 as a Glucocorticoid Responsive Gene that Modulates Cytokine Function in Airway Smooth Muscle Cells. PLOS ONE 9:e99625.↩︎